# Agent Tools Developer Guide

This guide covers how to add a new agent tool, how artifacts are laid out on disk, and how caching works for agent runs.

## Operational Model

Annolid agent operations are split into two layers:

- Self-improving: skills and memory evolve behavior without replacing installed code.
- Self-updating: signed update workflow stages and applies software updates with rollback plans.

### Self-improving

- Skills: loaded with precedence `workspace -> managed (~/.annolid/skills) -> bundled`.
- Hot reload: controlled by `skills.load.watch` and `skills.load.pollSeconds`.
- Skill manifest validation: frontmatter is validated at load time; invalid manifests are marked unavailable.
- Workspace memory: daily notes in `memory/YYYY-MM-DD.md` and curated long-term notes in `memory/MEMORY.md`.
- Pre-compaction flush: transcript snapshot can be appended before compaction via memory flush helpers.
- Memory retrieval plugin: default is local semantic ranking with keyword fallback (`workspace_semantic_keyword_v1`).

### Self-updating

- Channel-aware update manager supports `stable`, `beta`, and `dev`.
- Pipeline: `preflight -> stage -> verify -> apply -> restart marker -> post-check`.
- Rollback: rollback plan is generated for each run and executed on apply/post-check failures.
- Canary policy: rollout can enforce rollback thresholds using sample count, failure-rate, and regression limits.
- Safe update service: supports manifest check, artifact staging/download, checksum verification, signature verification, and transaction reporting.
- Auto-update: disabled by default; configurable interval+jitter schedule when enabled (`ANNOLID_AUTO_UPDATE_*` env settings).
- GUI controls: `AI Model Settings -> Agent Runtime` includes auto-update enable/channel/check-now/rollback and bot settings for skill hot reload, memory mode, and skill source locations.
- Production safety policy: in production mode (`ANNOLID_PRODUCTION_MODE=1` or `ANNOLID_ENV=production`), signed update manifests and signed non-builtin skills are required.

## How to add a tool

1. **Define the tool** by extending the base class in `annolid/core/agent/tools/base.py`:
   - Implement `run(self, ctx, payload)` with your core logic.
   - Use `ctx.results_dir` and `ctx.run_id` to derive stable outputs.
   - Use `ctx.artifact_store` if you want to persist artifacts and participate in caching.
2. **Register the tool** in the registry:
   - Add a new tool wrapper in `annolid/core/agent/tools/`.
   - Export it from `annolid/core/agent/tools/__init__.py`.
   - Register it with `ToolRegistry` (see `annolid/core/agent/tools/registry.py`).
3. **Integrate with the runner** (Phase 4+):
   - Compose tools using the registry and a pipeline definition.
   - Ensure inputs/outputs follow the unified data models in `base.py`.
4. **Write a minimal test**:
   - Use tiny inputs and validate outputs.
   - Prefer tests under `tests/` that don’t require large external models.

## Artifact layout

Artifacts are stored per video results directory and organized as:

- `/`
  - `agent.ndjson` (default agent output)
  - `_000000000.json` + per-frame LabelMe JSON
  - `.agent_runs//` (run-scoped artifacts)
  - `.cache/agent_cache.json` (cache metadata for re-run reuse)

The `FileArtifactStore` resolves paths relative to:

- **Run artifacts**: `.agent_runs//...`
- **Cache artifacts**: `.cache/...`

See `annolid/core/agent/tools/artifacts.py` for helpers.

## Caching semantics

Agent runs compute a **content hash** from:

- video path + filesystem stats (size/mtime),
- behavior spec (full schema),
- run config (stride, max frames, etc.),
- model identifiers,
- output NDJSON name.

If the cache hash matches and both the NDJSON and annotation store exist, the service returns cached results without re-running the agent.

To disable reuse from the CLI, run:

```
annolid-run agent --no-cache ...
```

## Citation management tools

Annolid includes built-in BibTeX tooling for paper citation workflows:

- CLI:
  - `annolid-run citations-list --bib-file refs.bib [--query ...]`
  - `annolid-run citations-upsert --bib-file refs.bib --key mykey --title ... --author ... --year ...`
  - `annolid-run citations-remove --bib-file refs.bib --key mykey`
  - `annolid-run citations-format --bib-file refs.bib`
- Agent function tools:
  - `bibtex_list_entries`
  - `bibtex_upsert_entry`
  - `bibtex_remove_entry`
  - `gui_save_citation` (save from active PDF/web viewer context)

Examples in Annolid Bot message input:

- `save citation`
- `list citations`
- `list citations from references.bib for annolid`
- `save citation from pdf as annolid2024 to references.bib`
- `save citation from web`
- `add citation @article{yang2024annolid, title={Annolid: Annotate, Segment, and Track Anything You Need}, author={Yang, Chen and Cleland, Thomas A}, journal={arXiv preprint arXiv:2403.18690}, year={2024}}`
- `save citation from web with strict validation`
- `save citation from pdf without validation`
- `open threejs example two mice`
- `open threejs example brain`
- `open threejs html /tmp/annolid_threejs_examples/two_mice.html`
- `open threejs https://example.org/viewer.html`

Default behavior:

- `save citation` first attempts Google Scholar BibTeX lookup from the active paper context, then falls back to Crossref/OpenAlex when needed, and saves the merged entry to `.bib`.

GUI workflow:

- In Annolid Bot input toolbar, click `📚` to open the citation manager.
- Manage a `.bib` file, save citations from active PDF/web context, choose auto-validation or strict mode, view/edit a `Source` column (URL or PDF path), edit rows inline with year/DOI checks, and remove selected entries.

See also: `docs/source/citations_tutorial.md` for a full user tutorial.

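
The default `save citation` behavior above is a lookup chain with fallback and merging. The following is a minimal sketch of that pattern only, not Annolid's implementation: the provider callables, the `lookup_with_fallback` name, and the rule that the chain stops once `title`, `author`, and `year` are filled are all assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, Optional

Entry = Dict[str, str]
# Hypothetical provider signature: takes a query, returns BibTeX-like fields or None.
Provider = Callable[[str], Optional[Entry]]

# Assumed definition of "complete enough to stop falling back".
REQUIRED_FIELDS = ("title", "author", "year")


def lookup_with_fallback(query: str, providers: Iterable[Provider]) -> Entry:
    """Query providers in order and merge fields; earlier providers win."""
    merged: Entry = {}
    for provider in providers:
        try:
            entry = provider(query) or {}
        except Exception:
            # A failing provider should not abort the whole chain.
            entry = {}
        for key, value in entry.items():
            # Only fill fields that earlier providers left empty.
            merged.setdefault(key, value)
        if all(merged.get(field) for field in REQUIRED_FIELDS):
            break  # No further fallback is needed.
    return merged
```

With stub providers standing in for Google Scholar, Crossref, and OpenAlex, a later provider is consulted only while required fields are still missing.
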
## Operator Commands

Use `annolid-run` commands for routine operations:

- `annolid-run agent skills refresh [--workspace ]`
- `annolid-run agent skills inspect [--workspace ]`
- `annolid-run agent memory flush [--workspace ] [--session-id ] [--note ]`
- `annolid-run agent memory inspect [--workspace ]`
- `annolid-run agent eval run --traces --candidate-responses --out `
- `annolid-run agent eval build-regression --workspace --out [--min-abs-rating 1]`
- `annolid-run agent eval gate --changed-files --report [--max-regressions 0] [--min-pass-rate 0.0]`
- `annolid-run agent feedback add --workspace --rating -1|0|1 [--trace-id ] [--comment ] [--expected-substring ]`
- `annolid-run update check --channel stable|beta|dev [--require-signature]`
- `annolid-run update run --channel stable|beta|dev [--execute] [--require-signature] [--skip-post-check] [--canary-metrics ]`
- `annolid-run update rollback --install-mode package|source --previous-version [--execute]`

### Admin Function APIs

The agent runtime also exposes operator-style function tools:

- `skills.refresh`
- `memory.flush`
- `eval.run`
- `update.run`
  - `update.run` requires explicit operator consent phrase for `execute=true`: `APPROVE_ANNOLID_CORE_UPDATE` (override with `ANNOLID_OPERATOR_UPDATE_CONSENT_PHRASE`).

## Shell Session Tools

For OpenClaw-style shell lifecycle workflows, Annolid now provides session tools:

- `exec_start(command, working_dir?, background?, timeout_s?, pty?)`
- `exec_process(action, session_id?, wait_ms?, tail_lines?, text?, submit?)`

Supported `exec_process.action` values:

- `list`, `poll`, `log`, `write`, `submit`, `kill`

Notes:

- `pty` is accepted but currently not enabled (`pty_supported=false` in responses).
- Basic dangerous command patterns are blocked at start time.
- Runtime policy group `group:runtime` now includes `exec`, `exec_start`, and `exec_process`.

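
To make the start/poll/log/kill lifecycle above concrete, here is a toy session registry built on the standard library. It is a sketch under stated assumptions, not the `exec_start`/`exec_process` implementation: the `ShellSessions` class, its method names, the log-file location, and the response shapes are hypothetical, and it omits `write`, `submit`, timeouts, and command blocking.

```python
import subprocess
import sys
import tempfile
import uuid
from pathlib import Path
from typing import Dict, List, Tuple


class ShellSessions:
    """Toy registry of background processes keyed by a session ID (hypothetical)."""

    def __init__(self) -> None:
        self._sessions: Dict[str, Tuple[subprocess.Popen, Path]] = {}

    def start(self, argv: List[str]) -> str:
        """Launch argv in the background, sending its output to a per-session log file."""
        session_id = uuid.uuid4().hex
        log_path = Path(tempfile.gettempdir()) / f"session_{session_id}.log"
        with open(log_path, "w") as log_file:
            proc = subprocess.Popen(argv, stdout=log_file, stderr=subprocess.STDOUT)
        self._sessions[session_id] = (proc, log_path)
        return session_id

    def poll(self, session_id: str, wait_ms: int = 0) -> dict:
        """Wait up to wait_ms for the process to exit, then report its status."""
        proc, _ = self._sessions[session_id]
        try:
            proc.wait(timeout=wait_ms / 1000)
        except subprocess.TimeoutExpired:
            pass  # Still running; report that below.
        return {"running": proc.poll() is None, "exit_code": proc.returncode}

    def log(self, session_id: str, tail_lines: int = 20) -> List[str]:
        """Return the last tail_lines lines written so far."""
        _, log_path = self._sessions[session_id]
        lines = log_path.read_text().splitlines()
        return lines[-tail_lines:] if tail_lines > 0 else []

    def kill(self, session_id: str) -> dict:
        """Terminate the process if it is still running and reap it."""
        proc, _ = self._sessions[session_id]
        if proc.poll() is None:
            proc.kill()
        proc.wait()
        return {"running": False, "exit_code": proc.returncode}


if __name__ == "__main__":
    sessions = ShellSessions()
    sid = sessions.start([sys.executable, "-c", "print('hello')"])
    print(sessions.poll(sid, wait_ms=5000))
    print(sessions.log(sid, tail_lines=5))
```

Writing output to a file rather than a pipe keeps `log` non-blocking while the process is still running, which is the main design point of the sketch.
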
## Improvement Quality Loop

- Anonymized run traces: `workspace/eval/run_traces.ndjson` captures hashed session/channel/chat IDs and redacted text previews.
- Explicit user feedback: `workspace/eval/feedback.ndjson` stores rating/comment/optional expected substring for promotion signals.
- Regression dataset build: combines traces + feedback into eval traces for CI and pre-promotion checks.
- Shadow mode: enable `ANNOLID_AGENT_SHADOW_MODE=1` to log alternative routing decisions to `workspace/eval/shadow_routing.ndjson`. Use `annolid-run agent skills shadow --candidate-pack ` to compare candidate skill packs before promotion.

## Governance and Audit

Governance events are stored as NDJSON with default path:

- `~/.annolid/governance/events.ndjson`

You can override it with:

- `ANNOLID_GOVERNANCE_EVENTS_PATH=/custom/path/events.ndjson`

Audited event categories include skill snapshot/refresh changes, memory writes/flushes, update stage/run actions, and rollback outcomes.

## Three.js bot tools

Annolid Bot supports direct Three.js viewer control in GUI sessions.

- Function tools:
  - `gui_open_threejs(path_or_url)`
  - `gui_open_threejs_example(example_id)`
- Built-in example IDs:
  - `two_mice_html` (default)
  - `brain_viewer_html`
  - `helix_points_csv`
  - `wave_surface_obj`
  - `sphere_points_ply`

The bot recognizes natural-language commands such as `open threejs example ...`.

## Browser Automation Safety

Annolid supports MCP browser automation with both granular tools and a unified tool:

- `mcp_browser` (single control surface with actions: `status|start|stop|navigate|snapshot|screenshot|act|wait`)
- `mcp_browser_navigate`, `mcp_browser_click`, `mcp_browser_type`, etc.

Navigation hardening:

- browser navigation allows `http://`, `https://`, and `about:blank` only.
- unsafe schemes such as `file://`, `javascript:`, and `data:` are blocked.
- GUI `open_url` also blocks `file://`; use an explicit local file path instead.

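
The navigation rules above amount to a scheme allowlist. A minimal sketch of such a check using `urllib.parse` follows; the function name `is_navigation_allowed` and the extra requirement that `http`/`https` URLs carry a host are assumptions for illustration, not the Annolid implementation.

```python
from urllib.parse import urlsplit

# Schemes permitted for browser navigation, per the list above.
ALLOWED_SCHEMES = {"http", "https"}


def is_navigation_allowed(url: str) -> bool:
    """Return True only for http(s) URLs with a host, or exactly about:blank."""
    candidate = url.strip()
    if candidate.lower() == "about:blank":
        return True
    parts = urlsplit(candidate)
    # urlsplit lowercases the scheme, so "HTTP://..." is treated as "http".
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.netloc)


if __name__ == "__main__":
    for url in (
        "https://example.org/viewer.html",
        "about:blank",
        "file:///etc/passwd",
        "javascript:alert(1)",
        "data:text/html,hi",
    ):
        print(url, is_navigation_allowed(url))
```

An allowlist is used rather than a blocklist so that any scheme not named explicitly is rejected by default.
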
## Annolid code/docs Q&A and tutorials

Annolid Bot is optimized to answer Annolid-specific questions from local docs and code context.

- It can explain modules, workflows, and settings with file-path references.
- It can generate on-demand tutorials for requested topics and levels using the active chat model, grounded by Annolid docs/code evidence.
- When a tutorial is saved to Markdown, Annolid Bot auto-opens the generated `.md` in the embedded web viewer.
- Direct command examples:
  - `create on demand tutorial for realtime camera setup in annolid`
  - `create beginner tutorial for behavior analysis and save to markdown file`
  - `how do i use annolid for behavior analysis`

## Realtime camera snapshot + email

Annolid Bot can capture a snapshot from a camera stream and send it by email.

- Stream snapshot:
  - GUI sessions: use `gui_check_stream_source` with `save_snapshot=true`.
  - This GUI tool now runs a full camera mission pipeline:
    - `probe -> capture -> annotate -> notify/email`
    - returns explicit `camera_mission.steps` and `delivery` status objects.
  - Non-GUI channels (for example email/IM): use `camera_snapshot`.
  - Snapshot files are saved under `.annolid/workspace/camera_snapshots/`.
  - Outlook Safe Links camera URLs are automatically unwrapped to the original stream URL.
  - Source fallback policy is intent-aware:
    - eye-blink intent defaults to camera `0`
    - network camera intent prefers remembered network streams.
- Email with attachments:
  - Use the `email` tool with:
    - `to`
    - `subject`
    - `content`
    - optional `attachment_paths` (list of local file paths)

Example bot intent:

- `check wireless camera, save a snapshot, and email it to user@example.com`

Realtime email/report spam control:

- Realtime bot report interval controls report cadence.
- Email requests use an additional minimum interval (`bot_email_min_interval_sec`, default `60s`) to avoid repeated email requests.

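
The minimum-interval rule for email requests can be pictured as a simple time gate. The sketch below is illustrative only: the `MinIntervalGate` class is hypothetical, and the only detail taken from this guide is the 60-second default of `bot_email_min_interval_sec`.

```python
import time
from typing import Callable, Optional


class MinIntervalGate:
    """Allow an action at most once per min_interval_sec (hypothetical helper)."""

    def __init__(
        self,
        min_interval_sec: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._min_interval_sec = min_interval_sec
        self._clock = clock  # Injectable so the gate can be tested without sleeping.
        self._last_allowed: Optional[float] = None

    def allow(self) -> bool:
        """Return True and record the time if enough time has passed, else False."""
        now = self._clock()
        if (
            self._last_allowed is not None
            and now - self._last_allowed < self._min_interval_sec
        ):
            return False
        self._last_allowed = now
        return True


if __name__ == "__main__":
    gate = MinIntervalGate(min_interval_sec=60.0)
    print(gate.allow())  # First request passes.
    print(gate.allow())  # An immediate repeat is suppressed.
```

A monotonic clock is used so that wall-clock adjustments cannot shorten or lengthen the interval.
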
## Security and policy hardening (Phase 2)

Adds stricter defaults for tool access and data handling:

- Capability-oriented tool profiles:
  - `gui`, `email`, `realtime`, `filesystem`
  - explicit capability expressions are supported, for example:
    - `capability:gui,email`
    - `capability:gui+realtime`
- Snapshot path hardening:
  - `camera_snapshot` writes only under workspace `camera_snapshots/`.
  - symlink escape paths are rejected.
- Redaction-at-source:
  - private/local stream endpoints are redacted in outbound content.
  - sensitive metadata keys (for example `peer_id`, `account_id`) are redacted before publish.
- Runtime high-risk guard:
  - deny-by-default blocks risky multi-tool chains unless explicit intent is provided.
  - config toggle: `agents.defaults.strict_runtime_tool_guard` (default `true`).

Example config:

```json
{
  "agents": {
    "defaults": {
      "strict_runtime_tool_guard": true
    }
  }
}
```

Explicit high-risk intent markers supported by policy/runtime guards:

- `intent:high-risk`
- `intent:high_risk`
- `allow:high-risk`
- `allow_high_risk`
- `unsafe:high-risk`

## Session memory and replay

Annolid agent sessions now keep separated memory layers and replayable event logs.

- Working memory:
  - short-horizon session summary derived from recent user/assistant turns.
  - stored in session metadata as `working_memory`.
  - bounded by a character quota in `PersistentSessionStore`.
- Long-term memory:
  - stable facts/notes derived from session facts and consolidation updates.
  - stored in session metadata as `long_term_memory`.
  - bounded by a character quota in `PersistentSessionStore`.

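
Both memory layers above are bounded by a character quota. How `PersistentSessionStore` enforces that bound is not described here, so the following is only a sketch of one plausible policy, keeping the most recent characters; the function names and the quota values are hypothetical.

```python
from typing import Dict


def clamp_to_quota(text: str, max_chars: int) -> str:
    """Keep at most max_chars characters, preferring the most recent (tail) text."""
    if max_chars <= 0:
        return ""
    if len(text) <= max_chars:
        return text
    return text[-max_chars:]


def set_bounded_memory(
    metadata: Dict[str, str], key: str, text: str, max_chars: int
) -> None:
    """Store text under a metadata key such as working_memory, bounded by the quota."""
    metadata[key] = clamp_to_quota(text, max_chars)


if __name__ == "__main__":
    session_metadata: Dict[str, str] = {}
    # Quota values below are illustrative, not Annolid defaults.
    set_bounded_memory(session_metadata, "working_memory", "x" * 5000, max_chars=2000)
    set_bounded_memory(session_metadata, "long_term_memory", "stable fact", max_chars=8000)
    print({k: len(v) for k, v in session_metadata.items()})
```

Tail-keeping suits a short-horizon summary; a store that values older, stable facts might instead trim from the end or drop whole entries.
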
### Deterministic consolidation and telemetry

Memory consolidation now uses deterministic triggers based on:

- session turn counter (`turn_counter`)
- next scheduled consolidation turn (`next_consolidation_turn`)
- history length relative to memory window

Telemetry is persisted in session metadata as `memory_telemetry` with entries like:

- `timestamp`
- `outcome` (for example `llm_consolidated`, `skipped_short_transcript`, `not_due`)
- `history_len`, `archive_len`, `keep_len`
- `elapsed_ms`

### Memory mutation audit trail

Session metadata contains `memory_audit_trail` entries for memory changes, including:

- `timestamp`
- `scope` (`facts`, `working_memory`, `long_term_memory`)
- `mutation` (for example `set_fact`, `set_working_memory`)
- `reason`
- `turn_id`
- `before_chars` / `after_chars`

### Safe replay for debugging

Session event records are stored in metadata key `event_log`.

- Each entry includes:
  - `timestamp`
  - `direction` (`inbound`/`outbound`)
  - `kind` (for example `user`, `assistant`, `progress`, `final`)
  - optional `turn_id`, `event_id`, `idempotency_key`
  - `payload`

GUI/backend helpers:

- `replay_session_debug_events(session_store=..., session_id=..., direction="", limit=200)`
- `format_replay_as_text(events)`

These helpers are implemented in:

- `annolid/core/agent/gui_backend/session_io.py`
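
To illustrate what replaying an `event_log` involves, the sketch below filters entries by direction, keeps the most recent ones, and renders them as text. It is not the code in `session_io.py`: `filter_events` and `format_events` are hypothetical stand-ins, the output format is invented, and the assumption that an empty `direction` means "both directions" is inferred from the default shown above.

```python
from typing import Any, Dict, List

Event = Dict[str, Any]


def filter_events(event_log: List[Event], direction: str = "", limit: int = 200) -> List[Event]:
    """Keep events matching direction (empty means all), returning the newest `limit`."""
    selected = [
        event
        for event in event_log
        if not direction or event.get("direction") == direction
    ]
    return selected[-limit:] if limit > 0 else []


def format_events(events: List[Event]) -> str:
    """Render one line per event: timestamp, direction/kind, then the payload."""
    lines = []
    for event in events:
        lines.append(
            f"{event.get('timestamp', '?')} "
            f"[{event.get('direction', '?')}/{event.get('kind', '?')}] "
            f"{event.get('payload')}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Sample entries using the fields listed above; values are made up.
    log = [
        {"timestamp": "t1", "direction": "inbound", "kind": "user", "payload": "hi"},
        {"timestamp": "t2", "direction": "outbound", "kind": "progress", "payload": "working"},
        {"timestamp": "t3", "direction": "outbound", "kind": "final", "payload": "done"},
    ]
    print(format_events(filter_events(log, direction="outbound", limit=1)))
```

Because the functions only read the log and never re-dispatch payloads, replay in this style has no side effects on the session.
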